Abstract:

This paper addresses the problem of building conceptual resources for multilingual applications. We describe new techniques for large-scale construction of a Chinese-English lexicon for verbs, using thematic-role information to create links between Chinese and English conceptual information. We then present an approach to compensating for gaps in the existing resources. The resulting lexicon is used for multilingual applications such as machine translation and cross-language information retrieval.
1 Introduction

With the advent of the web and increasingly global interconnectivity, the need for online multilingual information has grown significantly in the past 5-10 years. This is accompanied by a growing need for rapid construction of lexical resources. Creating resources by human labor alone has become infeasible, thus motivating the development of automatic and semi-automatic approaches to resource acquisition. This paper addresses large-scale construction of a Chinese-English lexicon for verbs, including an approach to compensating for gaps in the existing resources.
Figure 1: Relation Between Existing Resources and New Mappings


The lexicons resulting from our acquisition approach are used for semantic analysis in applications such as machine translation and cross-language information retrieval. The importance of semantic analysis in either of these two applications is clear when one considers the degree of inaccuracy that might result from using a weak alternative, such as access to a bilingual word list.

Our starting point is an existing classification of English verbs called EVCA (English Verb Classes and Alternations) (Levin, 1993). We couple this with a Chinese conceptual database called HowNet (Zhendong, 1988a; Zhendong, 1988b; Zhendong, 1988c) (http://www.how-net.com), from which we extract thematic-role information (e.g., a mapping between the HowNet “Patient” and the EVCA-based “Th(eme)”) to create links between Chinese and English conceptual information. HowNet currently contains no English translations; thus, we also use a large machine-readable Chinese-English dictionary called Optilex to produce candidate English translations.¹ Although later versions of HowNet are expected to include English translations, these are not openly available: only the binary versions have been promised, and these will be accessible solely through the use of (purchasable) HowNet software. Moreover, we expect our techniques to be generally applicable to other foreign-language semantic hierarchies where English translations are not available. We predict that this situation will arise more and more frequently as online (non-bilingual) linguistic resources continue to be made available in multiple languages.

¹Optilex is a large (600k entries) machine-readable version of the CETA Chinese-English dictionary, licensed from the MRM corporation, Kensington, MD.

Several researchers have investigated the problem of assigning class-based senses to verbs (Dorr, 1997; Palmer and Rosenzweig, 1996; Palmer and Wu, 1995) using a variety of online resources, including Longman’s Dictionary of Contemporary English (LDOCE) (Procter, 1978), EVCA (Levin, 1993), and WordNet (Miller and Fellbaum, 1991). Translation of English classes into other languages has proven difficult (Jones et al., 1994; Nomura et al., 1994; Saint-Dizier, 1996), but regularities between different language classifications can be found in some online resources (Dang et al., 1998; Dorr and Jones, 1999; Olsen et al., 1998).

This work extends previous work which used a concept space to produce a hierarchical organization of Chinese verbs (Palmer and Wu, 1995). We adopt a technique that is similar in flavor to that of (Dang et al., 1998) for partitioning English verbs into refined classes using WordNet, with the following extensions: (1) the use of the entire EVCA database rather than a small set of verbs (the break class); (2) the provision of a thematic-role-based filter for a more refined version of verb-class assignments; (3) concept alignment across two different language hierarchies (Chinese and English); and (4) mappings between Chinese and English thematic roles.

This work relies on an augmented set of EVCA classes that includes 26 new classes (Dorr, 1997). There are 500 classes in total in the extended set, each hand-tagged with semantic representations, thematic-role information, and WordNet synset numbers. We will demonstrate that it is possible to produce a lexicon by associating 709 Chinese HowNet concepts with the 500 EVCA classes, with a clear concept-to-class correspondence in a large majority of the cases.²

²HowNet contains 815 verb concepts altogether. However, we do not include the 106 HowNet concepts that are not associated with any Chinese words; these are “higher-level” conceptual nodes with no Chinese realization (e.g., V.1 |static|).

Figure 1 illustrates the relation between existing resources and the mappings we produced. Solid lines represent pre-existing mappings; dotted lines are ones resulting from the application of our techniques. The most critical of these is the one labeled θ-roles (shorthand for “thematic roles”), which associates EVCA classes with HowNet concepts. The remaining two dotted-line mappings are “transitive closure” byproducts of the other mappings: once the thematic-role mapping associates EVCA verbs with HowNet verbs, each HowNet verb is associated with Optilex-based English glosses (translations) and WordNet 1.6 senses.³
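The “transitive closure” step amounts to composing lookup tables. The Python sketch below illustrates the idea under stated assumptions: it is not the system described here, and every name and value in it (`theta_links`, `hownet_verbs`, `optilex`, `evca_senses`, the `WN:` sense labels, and the toy entries) is hypothetical. Once a HowNet concept is linked to an EVCA class, each Chinese verb under the concept keeps those Optilex glosses that are already tagged member verbs of the class, along with their WordNet senses.

```python
from collections import defaultdict

# Hypothetical toy data standing in for the real resources.
theta_links = {                      # HowNet concept -> linked EVCA classes
    "|Transport|": ["11.1 Send"],
    "|Help|": ["13.4.2 Equip"],
}
hownet_verbs = {                     # HowNet concept -> Chinese verbs (pinyin)
    "|Transport|": ["la", "shusong"],
    "|Help|": ["la"],
}
optilex = {                          # Chinese verb -> English glosses
    "la": ["pull", "drag", "transport", "help"],
    "shusong": ["transport", "convey"],
}
evca_senses = {                      # EVCA class -> {member verb: sense label}
    "11.1 Send": {"send": "WN:send", "transport": "WN:transport"},
    "13.4.2 Equip": {"equip": "WN:equip", "help": "WN:help"},
}


def close_links(theta_links, hownet_verbs, optilex, evca_senses):
    """Compose the mappings: (verb, concept) -> [(gloss, EVCA class, sense)]."""
    result = defaultdict(list)
    for concept, classes in theta_links.items():
        for evca_class in classes:
            senses = evca_senses.get(evca_class, {})
            for verb in hownet_verbs.get(concept, []):
                for gloss in optilex.get(verb, []):
                    if gloss in senses:  # gloss is already a tagged verb of the class
                        result[(verb, concept)].append((gloss, evca_class, senses[gloss]))
    return dict(result)


links = close_links(theta_links, hownet_verbs, optilex, evca_senses)
for key in sorted(links):
    print(key, links[key])
```

In this toy run, 拉 (la) ends up with transport under |Transport| and help under |Help|, while glosses such as pull and drag, which match no linked class, are not carried over.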

We will describe how these correspondences are derived, and we will show how this process has provided a framework for compensating for gaps in our online resources.

2 Multilingual Applications

The semantic representations produced semi-automatically for our multilingual resources are used in machine translation (MT) and cross-language information retrieval (CLIR) applications. Both applications rely on the use of a parser for mapping the input sentence into a syntactic tree. The parser output is semantically analyzed, producing an encoding of semantic and argument-structure information.

The MT approach is interlingual: the target-language lexicon is searched for appropriate lexical items matching argument-structure information (Dorr et al., 1998). A screen snapshot of an MT example is shown in Figure 2. The CLIR approach relies on the same interlingual representation to transform a user’s query into the document language for information retrieval (Dorr and Katsova, 1998; Levow et al., 2000).

In both of these applications, thematic roles facilitate the selection of appropriate target-language words. For example, the Chinese verb 拉 (la) corresponds to a wide range of English translations, even if we examine only the verb translations: slash, cut, chat, pull, drag, transport, move, raise, help, implicate, involve, defecate, press-gang.⁴ Our approach provides a framework for disambiguating such cases. Some of these possibilities (transport and move) are analyzed as one semantic representation, corresponding to the thematic roles (agent, theme, goal, source). Others (help) are analyzed as a different semantic representation, corresponding to the thematic roles (agent, theme, mod-poss).

³The Chinese verbs are additionally associated (for free) with WordNet senses from our previously tagged EVCA verbs. More details are given in (Dorr et al., 2000).

⁴The ambiguity in the word 拉 (la) can often be resolved if it is combined with other characters. For example, 拉车 (la che) unambiguously means pull a cart. However, since object dropping is a frequent phenomenon in Chinese, it is not uncommon for verbs like la to appear without an argument that easily disambiguates the word. Thus, our approach must allow for multiple possibilities in the lexicon.

Figure 2: Translation of a Chinese sentence into English

1. Associate English Optilex glosses with all 12,342 Chinese verbs in HowNet, producing 41,324 Chinese-English pairs.

2. Associate each verb-to-concept candidate with at least one of the 500 EVCA classes.⁵

3. For each HowNet concept, partition the associated Chinese-English pairs into groups whose English glosses correspond to EVCA classes.

Figure 3: Mapping Chinese HowNet Concepts to English EVCA Classes

3 Mapping Between Chinese HowNet and English EVCA

Our technique for mapping between Chinese HowNet concepts and English EVCA classes involves associating HowNet thematic roles with those in EVCA. Each HowNet concept (and each EVCA class) is paired with a list of thematic roles, which we call a thematic grid. For example, the HowNet concept |Cure| is paired with the grid (agent, patient, content, tool), as in The doctor(agent) cured the man(patient) of pneumonia(content) using antibiotics(tool). The corresponding grid in our EVCA database is (ag, th, mod-poss(of)). Although the HowNet and EVCA roles are not in a one-to-one correspondence, they can still be used for a “closest match” prioritization of candidate HowNet-EVCA associations, as we will see shortly.
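One simple way to turn a “closest match” into a number is sketched below in Python. It is an illustrative assumption rather than the matching procedure used in this work: `SEED` holds invented counts of how often a HowNet role was paired with an EVCA role in seed mappings, prepositions such as (of) are dropped from EVCA roles, and each HowNet role is greedily paired with a distinct EVCA role, strongest count first. The summed count of the chosen pairs serves as the match score.

```python
# Invented seed counts: (HowNet role, EVCA role) -> times paired in seed mappings.
SEED = {
    ("agent", "ag"): 20,
    ("patient", "th"): 17,
    ("content", "mod-poss"): 3,
    ("content", "th"): 1,
    ("scope", "mod-poss"): 2,
    ("LocationIni", "src"): 4,
    ("LocationFin", "goal"): 7,
}


def align_grids(hownet_grid, evca_grid, seed=SEED):
    """Greedily pair HowNet roles with distinct EVCA roles.

    Returns (score, mapping): score is the summed seed count of the chosen
    pairs (higher means a closer match); unmatched roles are left out.
    """
    # Every role pair, strongest seed evidence first.
    pairs = sorted(
        ((seed.get((h, e), 0), h, e) for h in hownet_grid for e in evca_grid),
        reverse=True,
    )
    mapping, used, score = {}, set(), 0
    for count, h, e in pairs:
        if count > 0 and h not in mapping and e not in used:
            mapping[h] = e
            used.add(e)
            score += count
    return score, mapping


# |Cure| against the EVCA grid (ag, th, mod-poss(of)), preposition dropped.
print(align_grids(["agent", "patient", "content", "tool"], ["ag", "th", "mod-poss"]))
```

For |Cure|, tool is left unmapped because the EVCA grid has no remaining role with seed support. An optimal (rather than greedy) assignment could be substituted without changing the interface.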

The three top-level tasks involved in mapping Chinese HowNet concepts to English EVCA classes are given in Figure 3. (See (Dorr et al., 2000) for more details.) For the purposes of this discussion, we focus on the last of these three tasks, which involves a massive filtering of spurious class assignments. This task involves three steps:

• Order the candidate EVCA classes so that the highest-ranking classes are those that contain the highest number of English verbs matching the Optilex glosses.

• In cases where a tie-breaker is needed, reorder the candidate EVCA classes according to the degree to which the thematic grid of the HowNet concept matches that of the relevant EVCA class. The matching procedure relies on correlations derived from approximately 200 seed mappings.⁶ Figure 4 shows a small subset of these mappings.

• For each Chinese-English entry associated with the HowNet concept, assign the highest-ranking candidate EVCA class.
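The three filtering steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in this work: `rank_classes` and `assign` are hypothetical names, the entries, class memberships and grid scores are toy values for a hypothetical concept, the grid scores are taken as precomputed, and step 3 is read as giving each entry the highest-ranking candidate class that actually contains its gloss.

```python
from collections import Counter


def rank_classes(entries, evca_members, grid_score):
    """Steps 1-2: order candidate EVCA classes for one HowNet concept.

    entries:      (chinese_verb, english_gloss) pairs under the concept
    evca_members: {evca_class: set of English member verbs}
    grid_score:   {evca_class: thematic-grid match score (higher = closer)}
    """
    glosses = Counter(gloss for _, gloss in entries)
    # Step 1: how many of the concept's glosses fall in each class.
    overlap = {c: sum(n for g, n in glosses.items() if g in verbs)
               for c, verbs in evca_members.items()}
    candidates = [c for c, k in overlap.items() if k > 0]
    # Step 2: overlap first; the grid score only acts as a tie-breaker.
    return sorted(candidates,
                  key=lambda c: (overlap[c], grid_score.get(c, 0)),
                  reverse=True)


def assign(entries, evca_members, grid_score):
    """Step 3: give each entry the best-ranked class containing its gloss."""
    ranking = rank_classes(entries, evca_members, grid_score)
    result = {}
    for verb, gloss in entries:
        for c in ranking:
            if gloss in evca_members[c]:
                result[(verb, gloss)] = c
                break
    return result


# Toy data: the two classes tie on gloss overlap, so the grid score decides.
evca_members = {"11.1 Send": {"send", "ship", "transport"},
                "13.4.2 Equip": {"equip", "help"}}
entries = [("la", "transport"), ("la", "help"), ("la", "chat")]
grid_score = {"11.1 Send": 48, "13.4.2 Equip": 22}

print(rank_classes(entries, evca_members, grid_score))
print(assign(entries, evca_members, grid_score))
```

The entry with gloss chat is left unassigned because no candidate class contains it; such entries are the kind of resource gap discussed in Section 4.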

⁶The seed mappings were done by hand at a rate of approximately 50 mappings per hour; these were verified by a native Chinese speaker in half a day.

Consider the case of the multiply ambiguous Chinese verb 拉 (la). Two of the HowNet concepts associated with this verb are |Help| and |Transport|. The thematic grid associated with |Help| is (agent, patient, scope) (as in John helped him with his work). This grid most closely matches that of the Equip EVCA class (where 拉 (la) is translated as help), which has the grid _ag_th,mod-poss(with); thus, the |Help| HowNet concept is associated with the Equip EVCA class, and the mapping between the two is (agent->ag), (patient->th), (scope->mod-poss).

On the other hand, the |Transport| HowNet concept is associated with the thematic grid (agent, patient, LocationIni, LocationFin, direction) (as in John transported the goods from Boston to New York (westward)). This grid most closely matches that of the Send EVCA class (where 拉 (la) is translated as transport); thus, the |Transport| HowNet concept is associated with the Send EVCA class, and the mapping between the two is (agent->ag), (patient->th), (LocationIni->src), (LocationFin->goal). The end result is that the English glosses associated with 拉 (la) are filtered down to help in EVCA’s Equip class and transport in EVCA’s Send class; the corresponding semantic representations are assigned from the EVCA database.

The massive filtering of spurious assignments is evident when we examine each individual HowNet concept. Consider the |Establish| HowNet concept. This concept is ultimately associated with only two EVCA classes, 29.2.c and 26.4.a (Characterize and Create), but it initially had 29 potential EVCA class assignments. One EVCA class that was ruled out is the Change of State class, 45.4.a, associated with the Optilex translation colonize for the Chinese verb 殖民 (zhimin). Although this is a perfectly valid EVCA class assignment for the HowNet concept |Colonize|, it is not appropriate for the |Establish| HowNet concept. Because this class is ranked 8th for |Establish| (as opposed to the 1st- and 2nd-place rankings of 29.2.c and 26.4.a, respectively), this assignment is ruled out by our algorithm.

4 Compensating for Resource Deficiencies

As part of our effort to produce a complete alignment between HowNet and EVCA, we built an EVCA-based canonical specification for each of the 709 HowNet concepts so that we could compensate for certain types of resource deficiencies. The canonical specification consists of an EVCA class coupled with its associated prototype verb. These canonical specifications provide a mapping between a HowNet concept and an EVCA class/prototype-verb pair.

Each canonical specification was automatically generated according to the highest-ranking EVCA class, using steps 3.a and 3.b (the first two filtering steps of task 3, described in Section 3). All such specifications were hand-verified (at a rate of 80 per hour for the 709 concepts). In most cases, the prototype verb names the HowNet concept, e.g., transport for the |Transport| HowNet concept. In other cases (where the HowNet concept is not an English word), the prototype word is a realization of that concept, e.g., belittle for the |PlayDown| HowNet concept. A sample of the canonical specifications is given in Figure 5.
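The automatic half of this procedure can be sketched in a few lines of Python. This is a hedged illustration: the `CanonicalSpec` type, the `make_spec` helper and the rule for picking a prototype (use the concept's own name when it is a member verb of the top-ranked class, otherwise leave the slot empty for the hand-verification pass) are assumptions modeled on the description above, and the class memberships are toy values.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class CanonicalSpec(NamedTuple):
    evca_class: str
    prototype: Optional[str]  # None: to be chosen by hand during verification


def make_spec(concept: str, ranked_classes: List[str],
              evca_members: Dict[str, Set[str]]) -> CanonicalSpec:
    """Draft a canonical specification from a concept's ranked EVCA classes."""
    top = ranked_classes[0]                 # highest-ranking EVCA class
    name = concept.strip("|").lower()       # "|Transport|" -> "transport"
    prototype = name if name in evca_members.get(top, set()) else None
    return CanonicalSpec(top, prototype)


# Toy memberships for illustration only.
members = {"11.1 Send": {"send", "ship", "transport"},
           "33.b Judgment": {"belittle", "praise"}}
print(make_spec("|Transport|", ["11.1 Send"], members))
print(make_spec("|PlayDown|", ["33.b Judgment"], members))
```

In the second case the hand-verification pass would supply a realization of the concept, such as belittle in Figure 5.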

We use these canonical specifications to compensate for gaps that arise in our three online resources: (1) EVCA, (2) Optilex, and (3) HowNet.
Figure 4: Seed Table for mapping HowNet Roles into EVCA Roles


HowNet Concept    Canonical Specification
|Transport|       11.1 Send, transport
|BeNot|           22.2.a Amalgamate, oppose
|Help|            13.4.2 Equip, help
|Moisten|         45.4.a Change of State, facilitate
|Excrete|         40.1.2 Breathe, bleed
|Apologize|       32.2.a Long, apologize
|PlayDown|        33.b Judgment, belittle
|Naming|          29.3 Dub, name
|Choose|          29.2.c, choose
|Announce|        37.7.b Say, announce
|Mean|            37.7.a Say, signify
|Communicate|     37.9.c Advise, inform

Figure 5: Sample of Canonical Specifications for Filling Resource Gaps


4.1 EVCA Gaps

An EVCA gap is detected when an Optilex verb gloss for a Chinese verb does not occur in EVCA. When this occurs, the canonical specification of the verb's HowNet concept is automatically used to assign the verb an appropriate EVCA class. For example, one Optilex gloss associated with the HowNet concept |Establish| (for the verb 重建 (chongjian)) is reconstruct,⁷ which does not occur in EVCA. Our technique associates this Chinese verb with the canonical specification “29.2.c Characterize, establish”, and the Chinese verb is then linked with the word sense associated with establish.

An interesting byproduct of the handling of EVCA gaps is that it allows us to enhance our EVCA resource. For example, the verb reconstruct can now be added to EVCA class 29.2.c, on a par with the previously classified EVCA verb establish.
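The EVCA-gap fallback and the resulting enrichment of EVCA can be sketched together. The Python below is a minimal illustration under assumptions: `classify_gloss` and the table layout are hypothetical, the two canonical specifications are the ones quoted in the text, class membership is a toy value, and the class-ranking step is skipped (a known gloss simply takes the first class that lists it).

```python
# Canonical specifications quoted in the text: concept -> (EVCA class, prototype).
CANONICAL = {
    "|Establish|": ("29.2.c Characterize", "establish"),
    "|Transport|": ("11.1 Send", "transport"),
}


def classify_gloss(concept, gloss, evca_members, canonical=CANONICAL):
    """Return (evca_class, sense_verb) for one Chinese-English entry.

    Simplification: if the gloss is already an EVCA verb, the first class
    listing it is returned. Otherwise an EVCA gap has been found, so the
    concept's canonical specification is used and the gloss is recorded
    as a new member of that class.
    """
    for evca_class, verbs in evca_members.items():
        if gloss in verbs:
            return evca_class, gloss
    evca_class, prototype = canonical[concept]
    evca_members.setdefault(evca_class, set()).add(gloss)  # enrich EVCA
    return evca_class, prototype


evca_members = {"29.2.c Characterize": {"characterize", "establish"}}
print(classify_gloss("|Establish|", "reconstruct", evca_members))
print(sorted(evca_members["29.2.c Characterize"]))
```

Because the function records the new member, a later lookup of reconstruct finds it directly in 29.2.c. Mutating the shared table is a deliberate simplification here; a real pipeline might instead log proposed additions for review.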


4.2 Optilex Gaps

An Optilex gap occurs when a particular translation for a Chinese verb is missing. For example, the verb 摆布 (baibu) has only one Optilex gloss: manipulate. However, the word is associated with two HowNet concepts, |Decorate| and |Control|, and this gloss is only appropriate for the |Control| concept. The decorate meaning of 摆布 (baibu) is omitted in Optilex.


Such gaps are detected by means of two types of information: (1) the HowNet and EVCA thematic grids; and (2) correlations between the gloss in question and other HowNet concepts. In this particular example, the thematic grid for manipulate in EVCA is (ag, exp, instr), which is ranked low (11th out of 28) with respect to the roles (agent, patient) associated with the HowNet |Decorate| concept. By contrast, this same EVCA class has a high ranking (2nd out of 22) with respect to the HowNet |Control| concept, due to a close match between (ag, exp, instr) and the HowNet thematic roles (agent, patient, ResultEvent). In addition, the correlation of the gloss manipulate is much higher for HowNet’s |Control| concept than for HowNet’s |Decorate| concept (4 occurrences compared to 0). From these two types of information, we can conclude that the decorate sense of 摆布 (baibu) is missing from Optilex. As in the case of EVCA gaps, our technique associates the Chinese verb with the canonical specification “9.8.b Fill, decorate” to compensate for this Optilex gap.
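The two kinds of evidence can be combined as in the following Python sketch, which is an assumption about one workable decision rule rather than the rule used in this work. A concept of a Chinese verb counts as supported if at least one of the verb's glosses either has an EVCA class ranked within a cut-off for that concept or already occurs among the concept's other verbs. The cut-off `max_rank=5` and the name `uncovered_concepts` are hypothetical; the ranks (11th, 2nd) and counts (0, 4) for manipulate are the ones reported above.

```python
def uncovered_concepts(concepts, glosses, class_rank, gloss_count, max_rank=5):
    """Return the HowNet concepts of a Chinese verb that none of its Optilex
    glosses supports, i.e. candidate Optilex gaps.

    class_rank:  {(gloss, concept): rank of the gloss's EVCA class for the concept}
    gloss_count: {(gloss, concept): occurrences of the gloss among the
                  concept's other verbs}
    max_rank:    hypothetical cut-off for an acceptable thematic-grid match
    """
    missing = []
    for concept in concepts:
        supported = any(
            class_rank.get((g, concept), float("inf")) <= max_rank
            or gloss_count.get((g, concept), 0) > 0
            for g in glosses
        )
        if not supported:
            missing.append(concept)
    return missing


# Figures for baibu taken from the text above.
class_rank = {("manipulate", "|Decorate|"): 11, ("manipulate", "|Control|"): 2}
gloss_count = {("manipulate", "|Decorate|"): 0, ("manipulate", "|Control|"): 4}
print(uncovered_concepts(["|Decorate|", "|Control|"], ["manipulate"],
                         class_rank, gloss_count))
```

The flagged concept, |Decorate|, would then be filled from its canonical specification, “9.8.b Fill, decorate”.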


In addition to their usefulness in handling gaps in our lexical resources, the canonical specifications proved useful for assigning EVCA classes to Chinese verbs whose Optilex gloss was not “parsable” by our gloss extraction procedure. For example, the Chinese verb 挨打 (aida) has only a single Optilex translation: take a beating. This verb is associated with the HowNet concept |Suffer|, which has as its canonical specification “31.3.d Marvel, suffer.” Thus, our technique associates this verb with this canonical specification.


A similar approach is used for unknown or misspelled words. For example, the translation of 输送 (shusong) in Optilex is misspelled as tranport. Because this verb is associated with HowNet’s |Transport| concept, we associate this verb with the canonical specification “11.1 Send, transport.”
4.3 HowNet Gaps

In some cases, the HowNet hierarchy incorrectly associates a Chinese word with a particular concept. For example, HowNet incorrectly associates the two Chinese verbs 扎花 (zhahua) and 绣花 (xiuhua) with the |Decorate| concept. These two verbs are translated as embroider, which belongs to EVCA class 26.1.b (Build), but their meaning is closer to sew flowers. That is, the patient is incorporated into the verb, which means the thematic grid _ag_th_goal(into),ben(for) does not match that of the HowNet concept (agent, possession, source).

Discrepancies in HowNet are detected by means of EVCA-class frequency for a particular HowNet concept. Of the 17 verbs associated with HowNet’s |Decorate| concept, only two (the two miscategorized Chinese verbs) are associated with an EVCA class other than 9.8 or 9.9. As in the gap-recovery approaches described above, our technique associates the miscategorized verbs with the canonical specification “9.8.b Fill, decorate.”⁸
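A frequency test of this kind might look like the Python sketch below. The function name `outlier_verbs`, the threshold `min_share=0.2` and the placeholder verb names are hypothetical, and the 10/5 split of the remaining 15 verbs between 9.8.b and 9.9.ii is invented for illustration; only the totals (17 verbs, two outliers in 26.1.b) come from the text.

```python
from collections import Counter


def outlier_verbs(verb_classes, min_share=0.2):
    """Flag verbs whose EVCA class is rare among a HowNet concept's verbs.

    verb_classes: {chinese_verb: evca_class} for the verbs of one concept
    min_share:    hypothetical threshold on a class's share of those verbs
    """
    counts = Counter(verb_classes.values())
    total = len(verb_classes)
    return sorted(v for v, c in verb_classes.items() if counts[c] / total < min_share)


# 17 verbs of |Decorate|: 15 placeholders in 9.8/9.9 plus the two
# miscategorized verbs (the 10/5 split is invented).
decorate = {f"verb{i}": "9.8.b Fill" if i <= 10 else "9.9.ii Butter" for i in range(1, 16)}
decorate.update({"zhahua": "26.1.b Build", "xiuhua": "26.1.b Build"})
print(len(decorate), outlier_verbs(decorate))
```

The two embroidery verbs account for 2 of 17 verbs (about 12%), below the assumed threshold, while both dominant classes stay well above it.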

5 Results

Preliminary results of our classification scheme were reported in (Dorr et al., 2000). This earlier work resulted in 8089 EVCA-classified Chinese entries, about 43% of the number of potential entries. The remaining 10441 entries were accounted for through the compensation techniques described above. Using the canonical specifications, we have achieved a more refined EVCA-to-HowNet mapping, providing an increase in EVCA-classified Chinese words from the previous 8089 entries to the current expanded set of 17284 EVCA-classified Chinese words. The histogram in Figure 6 characterizes the number of EVCA classes required for coverage of the 709 HowNet concepts.
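The entry counts above can be cross-checked with a few lines of Python: the classified and compensated entries sum to the 18530 entries reported in Section 6, and the pre-compensation share works out to roughly 43.7%.

```python
earlier = 8089        # EVCA-classified entries reported in (Dorr et al., 2000)
compensated = 10441   # entries accounted for by the compensation techniques
total = earlier + compensated

print(total)                            # lexicon size given in Section 6
print(round(100 * earlier / total, 1))  # share classified before compensation
```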

Examples of the HowNet partitionings into EVCA classes are given in Figure 7, with a focus on the cases where 1 partition was found. Percentages are given with respect to the number of Chinese verbs associated with each EVCA class.

We consider the approach to be a success for several reasons: (1) in 359 cases (50% of the HowNet concepts), the partitioning corresponded to 3 or fewer EVCA classes; (2) most HowNet concepts with 2 or more partitions had a very heavy association with a single EVCA class (60% or higher), with most other partitions falling around 20% or lower; (3) only 2 cases did not correspond to any EVCA class (i.e., degenerate HowNet concepts for which no correlations with EVCA could be found); (4) there were virtually no partitionings (only a handful of single HowNet concepts) exceeding 13 EVCA classes.

⁸Ultimately, the miscategorized verbs should be disassociated from the HowNet concept, but there is currently no way to tease apart such cases from the Optilex gaps. Thus, the two are treated identically.

6 Summary and Future Work

We have presented an approach to aligning two large-scale online resources, HowNet and EVCA. The lexicon resulting from this approach is large-scale, containing 18530 Chinese entries. The technique for producing these links involves matching thematic grids in HowNet with those in EVCA. Our results indicate that the correspondence between the 709 Chinese HowNet concepts and the 500 EVCA classes is very high. We see our techniques as the first step toward a general approach to building repositories for interlingual-based NLP applications.

We are currently investigating the use of the lexicon for word-sense disambiguation in machine translation and cross-language information retrieval. As we saw above, the Chinese verb 拉 (la) has several possible translations, but not all of these will be appropriate in every context. If we can determine which HowNet concept corresponds to 拉 (la), then we can translate it appropriately. For example, if the HowNet concept is |Transport|, the translation would be ship or transport, but not slash, chat, implicate, etc. We can detect which HowNet concept is appropriate by examining the other words in the sentence. If those words co-occur with other Chinese verbs associated with a particular HowNet concept (as determined through a corpus analysis), then it is likely that this HowNet concept is the appropriate one for the Chinese verb. That is, if we find other verbs from a given HowNet concept occurring in the same context, then we can hypothesize that this particular verb has the meaning of that HowNet concept.
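The co-occurrence heuristic can be sketched as a simple voting scheme. The Python below is an assumption-laden illustration, not an implemented component: `pick_concept` is hypothetical, the co-occurrence counts are invented, and English words stand in for the Chinese context tokens a real system would use. Each candidate concept is scored by how often the context words were seen near the concept's other verbs, and the highest-scoring concept wins.

```python
from collections import Counter


def pick_concept(context_words, candidates, cooc):
    """Choose the HowNet concept best supported by the surrounding words.

    candidates: HowNet concepts of the ambiguous verb
    cooc:       {concept: Counter of words observed near the concept's
                 other verbs in a corpus} (hypothetical corpus statistics)
    Ties go to the first candidate listed.
    """
    def support(concept):
        counts = cooc.get(concept, Counter())
        return sum(counts[w] for w in context_words)

    return max(candidates, key=support)


# Invented counts; English words stand in for Chinese context tokens.
cooc = {
    "|Transport|": Counter({"goods": 12, "truck": 9, "warehouse": 4}),
    "|Help|": Counter({"friend": 7, "work": 5, "difficulty": 6}),
}
print(pick_concept(["truck", "goods"], ["|Help|", "|Transport|"], cooc))
print(pick_concept(["friend", "work"], ["|Help|", "|Transport|"], cooc))
```

With |Transport| selected, the translation candidates for 拉 (la) would be narrowed to those linked to the Send class.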

The algorithm for mapping between HowNet concepts and EVCA classes requires a “training” step, i.e., the seed mappings given earlier. However, it is possible to produce a ranked mapping between thematic grids by counting correspondences between EVCA-based roles and HowNet-based roles across the entire concept space. This approach is also currently under investigation.
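One possible counting scheme is sketched below in Python; the text does not specify one, so the scheme, the function names and the toy associations are all assumptions. Each concept-class association proposed by the gloss-overlap step contributes one vote to every pairing of a HowNet role with an EVCA role, and the votes are then ranked per HowNet role.

```python
from collections import Counter
from itertools import product


def role_correlations(associations):
    """Count HowNet-role / EVCA-role co-occurrences without seed mappings.

    associations: (hownet_grid, evca_grid) pairs, one per concept-class link
    found by the gloss-overlap step. Every role pair in the cross product of
    the two grids receives one vote per link.
    """
    counts = Counter()
    for hownet_grid, evca_grid in associations:
        counts.update(product(hownet_grid, evca_grid))
    return counts


def ranked_targets(counts, hownet_role):
    """EVCA roles for one HowNet role, most frequent first (ties by name)."""
    rows = [(n, e) for (h, e), n in counts.items() if h == hownet_role]
    return [e for n, e in sorted(rows, key=lambda r: (-r[0], r[1]))]


# Toy associations for illustration.
associations = [
    (("agent", "patient"), ("ag", "th")),
    (("agent", "patient", "LocationFin"), ("ag", "th", "goal")),
    (("agent", "content"), ("ag", "mod-poss")),
]
counts = role_correlations(associations)
print(ranked_targets(counts, "agent"))
print(ranked_targets(counts, "patient"))
```

In this tiny example patient ties between ag and th: raw cross-product counting is noisy, so it would need many more associations, or constraints such as a one-to-one alignment within each pair, to separate the roles.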

Figure 6: Distribution of HowNet Concepts by Number of Intersecting EVCA Classes using Canonical Specifications (y-axis: # of HowNet Concepts)

HowNet Concept    EVCA Class(es)
|Transport|       11.1 Send
                  13.4.2 Equip
|Apologize|       32.2.a Long
|Naming|          29.3 Dub
|Judge|           29.4 Declare
|Moisten|         45.4.a Change of State
|Excrete|         40.1.2 Breathe
|TakeVehicle|     51.4.2.a.ii Motion by Vehicle
|PlayDown|        33.b Judgment (75%), 31.2.a Admire (25%)
|Establish|       29.2.c Characterize (90%), 26.4.a Create (19%)
|Decorate|        9.8.b Fill (50%), 26.1.b Build (43%), 9.9.ii Butter (25%)
|Teach|           29.2.c Characterize (24%), 33.b Judgment (71%), 37.9.a Advise (29%),
                  37.1.a Transfer Message (45%), 31.1.a Amuse (19%)

Figure 7: Examples of HowNet Partitionings with Respect to EVCA

Another area of investigation is the use of a WordNet-based distance metric (e.g., the information-content approach of (Resnik, 1995)) for additional pruning power in the HowNet-to-EVCA alignment. Because each of the entries in the EVCA classification is associated with a WordNet sense (Miller and Fellbaum, 1991), it is possible to rule out certain class assignments for a given HowNet concept by examining the semantic distance between the Optilex glosses for a particular Chinese word and the glosses for other words associated with that concept.
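The information-content measure of Resnik (1995) scores two words by the information content, -log p(c), of their most informative common subsumer. The self-contained Python sketch below computes it over a toy is-a hierarchy with invented frequencies rather than over WordNet; with real data, NLTK's WordNet interface provides the same measure as `Synset.res_similarity(other, ic)`. The helper `mean_similarity` is a hypothetical pruning signal that averages a gloss's similarity to the other glosses of a concept.

```python
import math

# Toy is-a hierarchy (child -> parent) and corpus frequencies, standing in
# for WordNet and the corpus counts used to estimate probabilities.
PARENT = {"act": None, "move": "act", "help": "act", "transport": "move",
          "ship": "transport", "carry": "transport", "assist": "help"}
FREQ = {"act": 1, "move": 10, "help": 8, "transport": 5,
        "ship": 3, "carry": 4, "assist": 2}


def ancestors(node):
    """The node itself followed by every node above it."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def information_content():
    """IC(c) = -log p(c) = log(total / count(c)), where count(c) covers c
    and everything below it (every node has a nonzero count here)."""
    subtree = dict.fromkeys(PARENT, 0)
    for word, n in FREQ.items():
        for a in ancestors(word):
            subtree[a] += n
    total = subtree["act"]  # the root subsumes every count
    return {c: math.log(total / n) for c, n in subtree.items()}


IC = information_content()


def resnik(a, b):
    """Resnik similarity: IC of the most informative common subsumer."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(IC[c] for c in common)


def mean_similarity(gloss, others):
    """Average similarity of a gloss to a concept's other glosses."""
    return sum(resnik(gloss, g) for g in others) / len(others)


print(round(mean_similarity("ship", ["carry", "transport"]), 3))
print(round(mean_similarity("assist", ["carry", "transport"]), 3))
```

Here assist shares only the root with the transport-related glosses, so its score is 0 and it would be the first candidate for pruning from a transport-like concept.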